In this notebook, we will illustrate how to extract topological features from tumors in the lungs. We start by setting the working directory and importing the required libraries.
# Set working directory (change accordingly)
workdir = "/home/robin/Documents/Stanford_VSR/NMI/TDA_Lung_Histology"
import os
import sys
os.chdir(workdir)
sys.path.insert(0, os.path.join(workdir, "Functions"))
# Handling arrays
import numpy as np
# Handling dicom images
import SimpleITK as sitk
# Persistent homology and topological feature extraction
import TDAfeatures as tf
from ripser import ripser
from persim import plot_diagrams
# Plotting tumor scans, segmentations, and surfaces
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import plotly
import plotly.graph_objs as go
# Tracking computation times
import time
We load an example scan/tumor from which we will obtain topological features.
# load the radiology image
vol = tf.read_dcm(os.path.join("Scan", "Dicom"))
lungs_img = sitk.GetArrayFromImage(vol)
# load the segmented tumor
lesion_mask = tf.slice_and_stack(os.path.join("Scan", "mask.png"))
bbox = tf.bbox_3D(lesion_mask)
# highlight the lesion on the lung CT scan
lungs_img_with_lesion = lungs_img.copy()
lungs_img_with_lesion[lesion_mask != 0] = np.max(lungs_img)
# identify the largest slice for visualization
current_tumor_pixels = -1
for idx in range(lesion_mask.shape[0]):
tumor_pixels_at_idx = np.sum(lesion_mask[idx,:,:] == 255)
if tumor_pixels_at_idx > current_tumor_pixels:
current_tumor_pixels = tumor_pixels_at_idx
tumor_idx = idx
# plot a slice of the scan and the corresponding tumor segmentation
fig, ax = plt.subplots(figsize=(3, 3))
ax.imshow(lungs_img_with_lesion[tumor_idx,:,:], cmap=plt.cm.bone)
ax.add_patch(patches.Rectangle((bbox[4], bbox[2]), bbox[5] - bbox[4], bbox[3] - bbox[2],
linewidth=2, edgecolor="r", facecolor="none"))
fig.tight_layout()
plt.show()
There are various ways to compute persistent homology from segmented tumors in CT scans. The first we will show is when the filtration, i.e., the sequence of simplicial complexes ordered by inclusion (from which we subsequently compute persistent homology) is obtained from the direct values of the segmented image pixels in the CT scan.
# extract the tumor pixels from the CT scan
lesion_img = tf.segment_3Darray(lesion_mask, lungs_img)
# plot a slice of the segmentation pixels
fig, ax = plt.subplots(figsize=(3, 3))
ax.imshow(lesion_img[tumor_idx - tf.bbox_3D(lesion_mask)[0],:,:], cmap=plt.cm.bone)
ax.set_axis_off()
plt.show()
We now use the intensity of the tumor pixels in the segmentation to construct the filtration. This is visuallized as follows.
ts = [-600, -400, -325, 125]
not_filled = np.ones_like(lesion_img)
filled = np.zeros_like(lesion_img)
colors = plt.cm.bone((lesion_img - np.nanmin(lesion_img)) / (np.nanmax(lesion_img) - np.nanmin(lesion_img)))
fig = plt.figure(figsize=(8, 8))
for idx, t in enumerate(ts):
to_fill = lesion_img <= t
filled[to_fill] = 1
not_filled[to_fill] = 0
ax = fig.add_subplot(2, 2, idx + 1, projection="3d")
ax.grid(False)
ax.voxels(filled, facecolors=colors, edgecolors="black", shade=False, linewidth=0.2)
ax.axes.xaxis.set_ticklabels([])
ax.axes.yaxis.set_ticklabels([])
ax.axes.zaxis.set_ticklabels([])
plt.gca().patch.set_facecolor("white")
ax.w_xaxis.set_pane_color((0.8, 0.8, 0.8, 1.0))
ax.w_yaxis.set_pane_color((0.8, 0.8, 0.8, 1.0))
ax.w_zaxis.set_pane_color((0.8, 0.8, 0.8, 1.0))
ax.set_title("t = {}".format(t), fontsize=15)
plt.show()
Persistent homology now computes the birth-time and death-times of 'holes' accross such filtrations. These holes can be distinguished by their dimension: components (H0), cycles (H1), and voids (H2). For our example filtration above, it can be computed from the image pixels as follows.
start_time = time.time()
dgms_img = tf.lower_star_img3D(lesion_img)
elapsed_time = time.time() - start_time
print("Time for computing persistence: " + time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))
Time for computing persistence: 00:00:02
Persistence diagrams can be used to visualize the results of our persistent homology computation. For each hole that was born at time b and that died at time d, it marks the point (b, d) above the first diagonal in the Euclidean plane. Note that d is possibly infinite, which corresponds to a hole that never dies. E.g., when the image includes at least one pixel, it will also include at least one connected component that never dies. Since holes are distinguished by their dimension, so are the persistence diagrams. We can visualize them as follows.
fig, ax = plt.subplots(figsize=(4, 4))
plot_diagrams(dgms_img, size=20, ax=ax)
for item in [ax.xaxis.label, ax.yaxis.label] + ax.get_xticklabels() + ax.get_yticklabels():
item.set_fontsize(15)
for item in ax.get_xticklabels() + ax.get_yticklabels():
item.set_fontsize(12)
plt.legend(prop={"size":15})
plt.show()
We can also quantify the topological information of our tumor when including surrounding tissue information. The procedure for this is exactly the same as above, only we include all pixels of a boundary box of the segmentation in the filtration.
# extract the tumor pixels from the CT scan
lesion_img_box = tf.segment_3Darray(lesion_mask, lungs_img, fillnan=False)
# plot a slice of the segmentation pixels
fig, ax = plt.subplots(figsize=(3, 3))
ax.imshow(lesion_img_box[tumor_idx - tf.bbox_3D(lesion_mask)[0],:,:], cmap=plt.cm.bone)
ax.set_axis_off()
plt.show()
start_time = time.time()
dgms_img_box = tf.lower_star_img3D(lesion_img_box)
elapsed_time = time.time() - start_time
print("Time for computing persistence: " + time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))
fig, ax = plt.subplots(figsize=(4, 4))
plot_diagrams(dgms_img_box, size=20, ax=ax)
for item in [ax.xaxis.label, ax.yaxis.label] + ax.get_xticklabels() + ax.get_yticklabels():
item.set_fontsize(15)
for item in ax.get_xticklabels() + ax.get_yticklabels():
item.set_fontsize(12)
plt.legend(prop={"size":15})
plt.show()
Time for computing persistence: 00:00:02
Finally, we can also quantify the topological information of the tumor, through the vertices of its surface mesh. While the diagrams above mainly quantify textural properties of the tumor, more global shape properties are quantified through the points on its surface.
verts, faces = tf.mesh_from_lesion(vol, lesion_mask)
print("Number of points on surface: " + str(verts.shape[0]))
# configure Plotly to be rendered inline in the notebook
plotly.offline.init_notebook_mode()
# configure the trace
trace = go.Scatter3d(
x = verts[:,0],
y = verts[:,1],
z = verts[:,2],
mode = "markers",
marker={
"size": .5,
"opacity": 0.8,
}
)
# configure the layout
layout = go.Layout(
margin={"l": 0, "r": 0, "b": 0, "t": 0},
width = 500,
height = 500
)
data = [trace]
plot_figure = go.Figure(data=data, layout=layout)
# Render the plot.
plotly.offline.iplot(plot_figure)
Number of points on surface: 7258
The point clouds generally too large for computing 2-dimensional persistent homology efficiently. We can overcome this issue by using a theoretically justified approximation algorithm for computing our diagrams using a set number of landmarks. This procedure is implemented in the Ripser library: https://pypi.org/project/ripser/.
n_landmarks = 500
start_time = time.time()
dgms_pc = ripser(verts, maxdim=2, n_perm=n_landmarks)["dgms"]
elapsed_time = time.time() - start_time
print("Time for computing persistence: " + time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))
fig, ax = plt.subplots(figsize=(4, 4))
plot_diagrams(dgms_pc, size=20, ax=ax)
for item in [ax.xaxis.label, ax.yaxis.label] + ax.get_xticklabels() + ax.get_yticklabels():
item.set_fontsize(15)
for item in ax.get_xticklabels() + ax.get_yticklabels():
item.set_fontsize(12)
plt.legend(prop={"size":15})
plt.show()
Time for computing persistence: 00:00:25
Many methods to learn from persistence diagrams have been developed, and are currently being researched. For further exploration, we highly recommend taking a look at the GUDHI library for many procedures compatible with sklearn: http://gudhi.gforge.inria.fr/python/latest/representations.html.
We currently consider a very simple fixed size vectorization of each persistence diagram (one per type of filtration and dimension of hole), which can be conducted as follows.
tf.persistence_statistics(dgms_img[0])
{'min_birth': -618.0,
'no_infinite_lifespans': 1,
'no_finite_lifespans': 504,
'mean_finite_midlifes': -151.84424603174602,
'mean_finite_lifespans': 68.45833333333333,
'std_finite_midlifes': 163.15176997586306,
'std_finite_lifespans': 82.98655376118911,
'skew_finite_midlifes': 0.0215852031737228,
'skew_finite_lifespans': 1.9510321391227285,
'kurtosis_finite_midlifes': -1.2245754539761748,
'kurtosis_finite_lifespans': 4.345181696510332,
'median_finite_midlifes': -178.0,
'median_finite_lifespans': 39.0,
'Q1_finite_midlifes': -285.125,
'Q1_finite_lifespans': 10.0,
'Q3_finite_midlifes': 22.5,
'Q3_finite_lifespans': 94.5,
'IQR_finite_midlifes': 307.625,
'IQR_finite_lifespans': 84.5,
'entropy_finite_lifespans': 5.617095177149606}